Generating Search Term Variants for Text Collections with Historic Spellings

Authors

  • Andrea Ernst-Gerlach
  • Norbert Fuhr

Abstract

In this paper, we describe a new approach to retrieval in texts with non-standard spelling, a problem that is important for historic texts in English or German. For this purpose, we present a new algorithm for generating search term variants in historic orthography. By applying a spell checker to a corpus of historic texts, we generate a list of candidate terms, to which the contemporary spellings must then be assigned manually. From these word pairs, our algorithm derives a set of probabilistic rules, and the rule probabilities can be used for ranking in the retrieval stage. An experimental comparison shows that our approach outperforms competing methods.
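The abstract does not say how the rules are represented or how their probabilities are estimated. The Python sketch below illustrates the general idea under simplifying assumptions that are ours, not the paper's. Each manually assigned (contemporary, historic) pair is aligned with `difflib.SequenceMatcher`, and every non-matching segment becomes a context-free rewrite rule; pure insertions are anchored on a neighbouring character. A rule's probability is its relative frequency among all occurrences of its left-hand side in the contemporary training words. The helper names `learn_rules`, `generate_variants` and `occurrences` are hypothetical, and the word pairs are illustrative only, not data from the paper.

```python
from collections import Counter
from difflib import SequenceMatcher


def occurrences(word: str, sub: str) -> list[int]:
    """Start positions of all (possibly overlapping) occurrences of sub in word."""
    return [i for i in range(len(word) - len(sub) + 1) if word.startswith(sub, i)]


def learn_rules(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Derive rewrite rules contemporary -> historic, each with a probability.

    Hypothetical simplification: alignment via difflib, no context beyond one
    anchoring character for pure insertions, relative-frequency probabilities.
    """
    rule_counts: Counter = Counter()
    for modern, historic in pairs:
        matcher = SequenceMatcher(None, modern, historic, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            lhs, rhs = modern[i1:i2], historic[j1:j2]
            if not lhs:  # pure insertion: anchor it on a neighbouring character
                if i1 > 0:
                    lhs, rhs = modern[i1 - 1], modern[i1 - 1] + rhs
                elif modern:
                    lhs, rhs = modern[0], rhs + modern[0]
                else:
                    continue  # empty contemporary word: nothing to anchor on
            rule_counts[(lhs, rhs)] += 1

    rules = {}
    for (lhs, rhs), count in rule_counts.items():
        # How often lhs occurs in the contemporary training words at all.
        total = sum(len(occurrences(modern, lhs)) for modern, _ in pairs)
        rules[(lhs, rhs)] = count / total
    return rules


def generate_variants(term: str, rules: dict[tuple[str, str], float],
                      min_prob: float = 0.0) -> list[tuple[str, float]]:
    """Apply each rule once at each matching position; rank variants by probability."""
    scored: dict[str, float] = {}
    for (lhs, rhs), prob in rules.items():
        if prob < min_prob:
            continue
        for i in occurrences(term, lhs):
            variant = term[:i] + rhs + term[i + len(lhs):]
            if variant != term:
                scored[variant] = max(prob, scored.get(variant, 0.0))
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Illustrative (contemporary, historic) pairs, not data from the paper.
    training = [("tür", "thür"), ("teil", "theil"), ("tag", "tag"), ("sein", "seyn")]
    rules = learn_rules(training)
    print(rules)                            # {('t', 'th'): 0.6666666666666666, ('i', 'y'): 0.5}
    print(generate_variants("bei", rules))  # [('bey', 0.5)]
```

Here "t" is followed by an inserted "h" in two of its three training occurrences, so the rule t → th gets probability 2/3. In a retrieval system, such scores could weight matches on each variant when ranking documents. A fuller version could add context to the rules and apply several rules to one term, multiplying their probabilities.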

Similar resources

Discovery of Term Variation in Japanese Web Search Queries

In this paper we address the problem of identifying a broad range of term variations in Japanese web search queries, where these variations pose a particularly thorny problem due to the multiple character types employed in its writing system. Our method extends the techniques proposed for English spelling correction of web queries to handle a wider range of term variants including spelling mist...

Rule-based search in historical text databases - Visualization techniques

The project Rule-Based Search in Historical Databases with Non-Standard Spellings (RSNSR, Pilz et al. 2005) will provide an online-available search engine that can be used by interested amateurs as well as professional linguists. Parallel to the implementation of a customizable software architecture to support an efficient search functionality recalling all relevant historical spellings of a mod...

Multi-User File System Search

Information retrieval research usually deals with globally visible, static document collections. Practical applications, in contrast, like file system search and enterprise search, have to cope with highly dynamic text collections and have to take into account user-specific access permissions when generating the results to a search query. The goal of this thesis is to close the gap between info...

Phonetic Models for Generating Spelling Variants

Proper names, whether English or non-English, have several different spellings when transliterated from a non-English source language into English. Knowing the different variations can significantly improve the results of name-searches on various source texts, especially when recall is important. In this paper we propose two novel phonetic models to generate numerous candidate variant spellings...

Comparison of LVG and MetaMap Functionality

LVG and MetaMap both compute lexical variants but were developed for quite different purposes: LVG’s raison d’être is lexical variant generation whereas MetaMap’s main purpose is to map text to corresponding concepts in the UMLS® Metathesaurus (Meta), one of the UMLS knowledge sources. Besides generating lexical variants, LVG has the subsumed ability to normalize words and the supplementary abi...

Publication date: 2006